Field of the Invention:

This invention relates to a software driven emulator comprised of
a large plurality of processors operating in parallel, each
capable of performing a logic function under program control, and
more particularly to a space efficient data and input memory
stack shared by a cluster of processors.

Trademarks:

S/390 and IBM are registered trademarks of International Business
Machines Corporation, Armonk, New York, U.S.A. and Lotus is a
registered trademark of its subsidiary Lotus Development
Corporation, an independent subsidiary of International Business
Machines Corporation, Armonk, NY. Other names may be registered
trademarks or product names of International Business Machines
Corporation or other companies.

Background:

The usefulness of software driven emulators has increased
enormously with growth in the complexity of integrated circuits.
Basically, an emulation engine operates to mimic the logical
design of a set of one or more integrated circuit chips. The
emulation of these chips in terms of their logical design is
highly desirable for several reasons. The utilization of
emulation engines has also grown up with and around the
corresponding utilization of design automation tools for the
construction and design of integrated circuit chip devices. In
particular, as part of the input for the design automation
process, logic descriptions of the desired circuit chip functions
are provided. The existence of such software tools for
processing these descriptions in the design process is well
suited to the utilization of emulation engines which are
electrically configured to duplicate the same logic function that
is provided by a design automation tool.

Utilization of emulation devices permits testing and verification
via electrical circuits of logic designs before these designs are
committed to a so-called "silicon foundry" for manufacture. The
input to such foundries is the functional logic description
required for the chip and its output is initially a set of
photolithographic masks which are then used in the manufacture of
the desired electrical circuit chip device. Verifying that logic
designs are correct in the early stage of chip manufacturing
eliminates the need for costly and time-consuming second passes
through a silicon foundry.

Another advantage of emulation systems is that they provide a
device that makes possible the early validation of software meant
to operate the emulated chips. Thus, software can be designed,
evaluated and tested well before the time when actual circuit
chips become available. Additionally, emulation systems can also
operate as simulator-accelerator devices, thus providing a
high-speed simulation platform.

Emulation engines of the type contemplated by this invention
contain an interconnected array of emulation processors (EP).
Each emulation processor (hereinafter, also sometimes simply
referred to as "processor") can be programmed to evaluate logic
functions (for example, AND, OR, XOR, NOT, NOR, NAND, etc.). The
program driven processors operate together as an interconnected
unit, emulating the entire desired logic design. However, as
integrated circuit designs grow in size, more emulation
processors are required to accomplish the emulation task. An
aim, therefore, is to increase the capacity of emulation engines
in order to meet the increasingly difficult task of emulating
more and more complex circuits and logic functions by increasing
the number of emulation processors in each of its modules.

For purposes of better understanding the structure and operation
of emulation devices generally, and this invention particularly,
United States Patent No. 5,551,013 and patent application Serial
No. 09/373,125 filed August 12, 1999 and assigned to the
assignee of this application are hereby incorporated herein by
reference.

Patent No. 5,551,013 shows an emulation module having multiple
(e.g. 64) processors. All processors within the module are
identical. The sequencer and the interconnection network occur
only once in a module. The control stores hold a program created
by an emulation compiler for a specified processor. The stack
memory, which is the subject of this invention, holds data and
inputs previously generated and is addressed by fields in a
corresponding control word to locate the bits for input to the
processor logic element. During each step of the sequencer an
emulation processor emulates a logic function according to the
emulation program. The data flow control interprets the current
control word to route and latch data within the processor. The
node-bit-out signal from a specified processor is presented to
the interconnection network where it is distributed to each of
the multiplexors (one for each processor) of the module. The node
address field in the control word allows a specified processor to
select for its node-bit-in signal the node-bit-out signal from
any of the processors within its module. The node bit is stored
in the input stack on every step. During any operation the
node-bit-out signal of a specified processor may be accessed by
none, one, or all of the processors within the module.

Data routing within each processor's data flow and through the
interconnection network occurs independently of and overlaps the
execution of the logic emulation function in each processor.
Each control store stores control words executed sequentially
under control of the sequencer and program steps in the
associated module. Each revolution of the sequencer causes the
step value to advance from zero to a predetermined maximum value
and corresponds to one target clock cycle for the emulated
design. A control word in the control store is simultaneously
selected during each step of the sequencer. A logic function
operation is defined by each control word.
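
The stepping behaviour described above can be illustrated with a
short Python sketch. It is a simplified software model only, not
the hardware of the referenced patent: the names ControlWord,
LOGIC_FUNCTIONS and run_cycle, the two-operand form of each
function, and the choice to append each result to the stack are
all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical two-input logic functions a processor might evaluate.
LOGIC_FUNCTIONS: Dict[str, Callable[[int, int], int]] = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
    "NAND": lambda a, b: 1 - (a & b),
    "NOR": lambda a, b: 1 - (a | b),
}


@dataclass
class ControlWord:
    function: str  # logic function evaluated on this step
    addr_a: int    # stack address of the first operand bit (assumed field)
    addr_b: int    # stack address of the second operand bit (assumed field)


def run_cycle(control_store: List[ControlWord], stack: List[int]) -> List[int]:
    """One sequencer revolution: one control word is executed per step."""
    results = []
    for word in control_store:
        evaluate = LOGIC_FUNCTIONS[word.function]
        bit = evaluate(stack[word.addr_a], stack[word.addr_b])
        stack.append(bit)  # simplification: result is visible to later steps
        results.append(bit)
    return results


program = [
    ControlWord("AND", 0, 1),
    ControlWord("OR", 0, 1),
    ControlWord("XOR", 2, 3),  # combines the two earlier results
]
print(run_cycle(program, [1, 0]))  # prints [0, 1, 1]
```

One pass over the control store in this model corresponds to one
target clock cycle of the emulated design.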

Each of these emulation processors has an execution unit for
processing multiple types of logic gate functions. Each
emulation processor switches from a specified one logic gate
function to a next logic gate function in a switched-emulation
sequence of different gate functions. The switched-emulation
sequence of each of the processors thus can emulate a subset of
gates in a hardware arrangement in which gates are of any type
that the emulation processors functionally represent for a
sequence of clock cycles. The processors are coupled by a like
number of multiplexors having outputs respectively connected to
the emulation processors of a module and having inputs
respectively connected to each of the other emulation processors.
The bus connected to the multiplexors enables an output from any
emulation processor to be transferred to an input of any other of
the emulation processors. In accordance with the teachings of
the pending application, the basic design of the 5,551,013 patent
is improved by interconnecting processors into clusters. With
the processors interconnected in clusters, the evaluation phases
can be cascaded, and all processors in a cluster perform the
setup and storing of results in parallel. This setup includes
routing of the data through multiple evaluation units for the
evaluation phase. For most efficient operation, the input stack
and data stack of each processor must be stored in shared memory
within each cluster. Then, all processors perform the storage
phase, again in parallel. The net result is multiple cascaded
evaluations performed in a single emulation step. Every
processor in a cluster can access the input and data stacks of
every other processor in the cluster, which leaves less space on
each module chip for the functions that support the processor
operation, particularly the memory functions. While the
emulation processor described in the co-pending application has
obvious advantages, as more and more components are placed on a
single ET 4 chip, the availability of real estate on the chip
becomes more and more a factor in the successful realization of
an advanced emulation processor design.



Summary of the Invention:

An object of this invention is the provision of an emulation 
processor cluster input and data stack memory that makes more 
efficient use of the silicon real estate as compared with prior 
art emulation processor cluster designs. 

Briefly, this invention contemplates the provision of an emulator
processor cluster in which the read ports of a shared input and
data memory stack are time multiplexed to serve more than one
processor. In an exemplary embodiment of the invention, a 256 x
8 memory array serves as the shared memory for four processors in
a cluster. Two read ports are time multiplexed among the four
processors in the cluster. On one read cycle, data from the two
read ports is coupled to two processors. The next read cycle
reads data from the same two ports to the remaining two
processors. In the preferred embodiment, the memory operates at
twice the system clock speed so that overall emulation process
execution time is not affected.

Brief Description of the Drawings: 

The foregoing and other features and advantages of the invention
will be described with reference to the accompanying drawings.
In the drawings, like reference numbers generally indicate
identical, functionally similar, and/or structurally similar
elements. Also in the figures, the left most digit of each
reference number corresponds to the figure in which the reference
number is first used.

Figure 1 is a block diagram of a four processor cluster of the
type more fully described in copending application serial no.
09/373,125, and included here to illustrate the technology state
from which this invention departs.

Figure 2 is a block diagram of the cluster of Figure 1 (shown in 
less detail than in Figure 1) with multiplexed read port input 
and data memory in accordance with the teachings of this 
invention. 

Detailed Description of the Invention:

Referring now to Figure 1, as described more completely in
application serial no. 09/373,125, each cluster of four
processors (Processor0, Processor1, Processor2 and Processor3)
has a shared data and input memory stack to which and from which
each processor in the cluster can write and read. In this
exemplary emulation processor, the memory stack has 256
addressable eight-bit words. Each processor has four read ports
for reading an eight-bit word from the data memory comprised of
address inputs RA0, RA1, RA2 and RA3 and corresponding inputs to
the four eight-to-one multiplexers whose select inputs are TAC0,
TAC1, TAC2 and TAC3 respectively. On each clock cycle, the inputs
of the eight-to-one multiplexers of each of the four processors
receive an eight-bit word from the memory. While this arrangement
is generally satisfactory, there are a large number of processors
(e.g. 64) and a correspondingly large number of memory stacks on
a single ET 4 emulator chip. Silicon real estate is in short
supply and the stack memory output ports take up a lot of area on
the chip.

Referring now to Figure 2, in accordance with the teachings of the
invention, the read ports of the data and input memory stack are
time multiplexed and the memory has only two sets of read ports
to serve four processors, in this specific embodiment. So that
the overall emulation process is not slowed, a memory stack read
is carried out at twice the system clock speed. On one memory
cycle, multiplexer 10 connects set A read address ports RA0, RA1,
RA2 and RA3 to address inputs for processor P0 and the
corresponding four outputs as inputs to the four eight-to-one
multiplexers of processor P0. Similarly, on this same memory
cycle, multiplexer 12 connects set B read address ports to
address inputs for processor P1 and the corresponding four
outputs as inputs to the four eight-to-one multiplexers of
processor P1. On the next memory cycle, multiplexer 10 connects
the set A read ports to address inputs for processor P2 and the
corresponding four outputs as inputs to the four eight-to-one
multiplexers of processor P2. Also, multiplexer 12 connects the
set B read ports to address inputs for processor P3 and the
corresponding four outputs as inputs to the four eight-to-one
multiplexers of processor P3.
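
The read schedule just described can be modelled in a few lines
of Python. This is a behavioural sketch under stated assumptions,
not a description of the circuit: the names SCHEDULE and
system_clock_read, the test pattern loaded into the memory, and
the addresses chosen for each processor are hypothetical. Only
the 256-word memory, the four read addresses per processor, and
the pairing of port sets A and B with processors P0/P2 and P1/P3
are taken from the text.

```python
from typing import Dict, List

MEMORY_WORDS = 256  # 256 addressable eight-bit words

# One entry per memory cycle: which processor each port set serves.
SCHEDULE = [
    {"A": 0, "B": 1},  # first memory cycle: set A -> P0, set B -> P1
    {"A": 2, "B": 3},  # second memory cycle: set A -> P2, set B -> P3
]


def system_clock_read(memory: List[int],
                      addresses: Dict[int, List[int]]) -> Dict[int, List[int]]:
    """Serve every processor within one system clock (two memory cycles)."""
    delivered: Dict[int, List[int]] = {}
    for port_map in SCHEDULE:
        for _port_set, proc in port_map.items():
            # The port set reads the four words addressed by RA0..RA3
            # of the processor it is connected to on this memory cycle.
            delivered[proc] = [memory[a] for a in addresses[proc]]
    return delivered


memory = [(3 * i) % 256 for i in range(MEMORY_WORDS)]  # arbitrary pattern
addresses = {p: [p, p + 4, p + 8, p + 12] for p in range(4)}
words = system_clock_read(memory, addresses)
print(words[2])  # prints [6, 18, 30, 42]
```

Because both memory cycles fit inside one system clock, each of
the four processors still receives its four words once per system
clock, while the memory itself carries only two sets of read
ports.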

It will be appreciated that the teachings of this invention can
be readily applied where there are eight processors in the
cluster. Eight processors in a cluster can be time multiplexed
to share four read ports, while the read clock for the memory
operates at two times the system clock. If the memory clock is
operated at four times the system clock, the memory can be
multiplexed so that two ports can serve all eight processors
without slowing the emulation process.
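
The relationship between cluster size, memory clock rate and
port count in these examples reduces to a simple division, shown
here as a Python sketch. The function name port_sets_needed is an
assumption for illustration, and the rounding up for cluster
sizes that do not divide evenly is an extrapolation not stated in
the text.

```python
def port_sets_needed(processors: int, clock_multiple: int) -> int:
    """Read ports (or port sets) needed to serve every processor
    once per system clock when the memory read clock runs at
    `clock_multiple` times the system clock."""
    if processors <= 0 or clock_multiple <= 0:
        raise ValueError("processors and clock_multiple must be positive")
    # Each port serves `clock_multiple` processors per system clock;
    # ceiling division covers sizes that do not divide evenly.
    return -(-processors // clock_multiple)


for processors, multiple in [(4, 2), (8, 2), (8, 4)]:
    print(processors, multiple, port_sets_needed(processors, multiple))
# prints:
# 4 2 2
# 8 2 4
# 8 4 2
```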

While the preferred embodiment of the invention has been
described, it will be understood that those skilled in the art,
both now and in the future, may make various improvements and
enhancements which fall within the scope of the claims which
follow. These claims should be construed to maintain the proper
protection for the invention first described.

What is claimed is:



Claim 1. A software driven emulation engine comprising, in
combination:
     a plurality of modules, each of said modules
including a plurality of processors organized in clusters,
each cluster comprised of a plurality of processors that
access a data memory;
     a time division multiplexer coupling a set of read
ports of said data memory to a set of read addresses of one
processor of said cluster during one read cycle of said data
memory and coupling said set of read ports of said data
memory to a set of read addresses of another processor of
said cluster during the next read cycle of said memory.

Claim 2. A software driven emulation engine as in claim 1
wherein said data memory operating clock rate for read
operations is twice the operating clock rate of said module.

Claim 3. A software driven emulation engine comprising in
combination:
     a plurality of modules, each of said modules
including a plurality of processors organized in clusters,
each cluster comprised of a plurality of processors that
access a data memory;
     a first time division multiplexer coupling a first
set of read ports of said data memory to a set of read
addresses of a first processor of said cluster during one
read cycle of said data memory and coupling said first set
of read ports of said data memory to a set of read addresses
of a second processor in said cluster during the next read
cycle of said memory;
     a second time division multiplexer coupling a
second set of read ports of said data memory to a set of
read addresses of a third processor of said cluster during
said one read cycle of said data memory and coupling said
second set of read ports of a set of read addresses of a
fourth processor in said cluster during said next read cycle
of said memory.

Claim 4. A software driven emulation engine as in claim 3
wherein said data memory operating clock rate for read
operations is twice the operating clock rate of said module.

TITLE: HIGH SPEED SOFTWARE DRIVEN EMULATOR COMPRISED OF A
PLURALITY OF EMULATION PROCESSORS WITH IMPROVED
MULTIPLEXED DATA MEMORY



Abstract of the Disclosure: 



In an emulator processor cluster, the read ports of a shared
input and data memory stack are time multiplexed to serve more
than one processor. In an exemplary embodiment of the invention,
a 256 x 8 memory array serves as the shared memory for four
processors in a cluster. Two read ports are time multiplexed
among the four processors in the cluster. On one read cycle,
data from the two read ports is coupled to two processors. The
next read cycle reads data from the same two ports to the
remaining two processors. In the preferred embodiment, the
memory operates at twice the system clock speed so that overall
emulation process execution time is not affected.
Full Name of fourth joint inventor: Helmut ROTH

Signature:

Date:

Residence: 69 Dakota Drive, Hopewell Junction, New York 12533
Citizenship: German
Post Office Address: Same as above

Full Name of fifth joint inventor: Peter TANNENBAUM

Signature:

Date:

Residence: Woodstock, New York
Citizenship: United States
Post Office Address: P.O. Box 172, Woodstock, New York 12498




Full Name of sixth joint inventor: Lawrence A. THOMAS

Signature:

Date:

Residence: 100 Pleasant Ridge Drive, West Hurley, New York 12491
Citizenship: United States
Post Office Address: Same as above

Full Name of seventh joint inventor: Norton J. TOMASSETTI

Signature:

Date:

Residence: Parsell Street, Kingston, New York 12401
Citizenship: United States
Post Office Address: Same as above




S///2000 



